51 research outputs found

    A proposed data fusion architecture for micro-zone analysis and data mining

    Full text link
    Data Fusion requires the ability to combine or “fuse” date from multiple data sources. Time Series Analysis is a data mining technique used to predict future values from a data set based upon past values. Unlike other data mining techniques, however, Time Series places special emphasis on periodicity and how seasonal and other time-based factors tend to affect trends over time. One of the difficulties encountered in developing generic time series techniques is the wide variability of the data sets available for analysis. This presents challenges all the way from the data gathering stage to results presentation. This paper presents an architecture designed and used to facilitate the collection of disparate data sets well suited to Time Series analysis as well as other predictive data mining techniques. Results show this architecture provides a flexible, dynamic framework for the capture and storage of a myriad of dissimilar data sets and can serve as a foundation from which to build a complete data fusion architecture

    Building Energy Load Forecasting using Deep Neural Networks

    Full text link
    Ensuring sustainability demands more efficient energy management with minimized energy wastage. Therefore, the power grid of the future should provide an unprecedented level of flexibility in energy management. To that end, intelligent decision making requires accurate predictions of future energy demand/load, both at aggregate and individual site level. Thus, energy load forecasting have received increased attention in the recent past, however has proven to be a difficult problem. This paper presents a novel energy load forecasting methodology based on Deep Neural Networks, specifically Long Short Term Memory (LSTM) algorithms. The presented work investigates two variants of the LSTM: 1) standard LSTM and 2) LSTM-based Sequence to Sequence (S2S) architecture. Both methods were implemented on a benchmark data set of electricity consumption data from one residential customer. Both architectures where trained and tested on one hour and one-minute time-step resolution datasets. Experimental results showed that the standard LSTM failed at one-minute resolution data while performing well in one-hour resolution data. It was shown that S2S architecture performed well on both datasets. Further, it was shown that the presented methods produced comparable results with the other deep learning methods for energy forecasting in literature

    An Adversarial Approach for Explainable AI in Intrusion Detection Systems

    Full text link
    Despite the growing popularity of modern machine learning techniques (e.g. Deep Neural Networks) in cyber-security applications, most of these models are perceived as a black-box for the user. Adversarial machine learning offers an approach to increase our understanding of these models. In this paper we present an approach to generate explanations for incorrect classifications made by data-driven Intrusion Detection Systems (IDSs). An adversarial approach is used to find the minimum modifications (of the input features) required to correctly classify a given set of misclassified samples. The magnitude of such modifications is used to visualize the most relevant features that explain the reason for the misclassification. The presented methodology generated satisfactory explanations that describe the reasoning behind the mis-classifications, with descriptions that match expert knowledge. The advantages of the presented methodology are: 1) applicable to any classifier with defined gradients. 2) does not require any modification of the classifier model. 3) can be extended to perform further diagnosis (e.g. vulnerability assessment) and gain further understanding of the system. Experimental evaluation was conducted on the NSL-KDD99 benchmark dataset using Linear and Multilayer perceptron classifiers. The results are shown using intuitive visualizations in order to improve the interpretability of the results

    Improving cyber-security of smart grid systems via anomaly detection and linguistic domain knowledge

    Full text link
    The planned large scale deployment of smart grid network devices will generate a large amount of information exchanged over various types of communication networks. The implementation of these critical systems will require appropriate cyber-security measures. A network anomaly detection solution is considered in this work. In common network architectures multiple communications streams are simultaneously present, making it difficult to build an anomaly detection solution for the entire system. In addition, common anomaly detection algorithms require specification of a sensitivity threshold, which inevitably leads to a tradeoff between false positives and false negatives rates. In order to alleviate these issues, this paper proposes a novel anomaly detection architecture. The designed system applies the previously developed network security cyber-sensor method to individual selected communication streams allowing for learning accurate normal network behavior models. Furthermore, the developed system dynamically adjusts the sensitivity threshold of each anomaly detection algorithm based on domain knowledge about the specific network system. It is proposed to model this domain knowledge using Interval Type-2 Fuzzy Logic rules, which linguistically describe the relationship between various features of the network communication and the possibility of a cyber attack. The proposed method was tested on experimental smart grid system demonstrating enhanced cyber-security

    Information gain based dimensionality selection for classifying text documents

    Full text link
    Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel, genetic algorithm based methodology, for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods

    WESBES: A wireless embedded sensor for improving human comfort metrics using temporospatially correlated data

    Full text link
    When utilized properly, energy management systems (EMS) can offer significant energy savings by optimizing the efficiency of heating, ventilation, and air-conditioning (HVAC) systems. However, difficulty often arises due to the constraints imposed by the need to maintain an acceptable level of comfort for a building’s occupants. This challenge is compounded by the fact that human comfort is difficult to define in a measurable way. One way to address this problem is to provide a building manager with direct feedback from the building’s users. Still, this data is relative in nature, making it difficult to determine the actions that need to be taken, and while some useful comfort correlations have been devised, such as ASHRAE’s Predicted Mean Vote index, they are rules of thumb that do not connect individual feedback with direct, diverse feedback sensing. As they are a correlation, quantifying effects of climate, age of buildings and associated defects such as draftiness, are outside the realm of this correlation. Therefore, the contribution of this paper is the Wireless Embedded Smart Block for Environment Sensing (WESBES); an affordable wireless sensor platform that allows subjective human comfort data to be directly paired with temporospatially correlated objective sensor measurements for use in EMS. The described device offers a flexible research platform for analyzing the relationship between objective and subjective occupant feedback in order to formulate more meaningful measures of human comfort. It could also offer an affordable and expandable option for real world deployment in existing EMS

    Information Gain Based Dimensionality Selection for Classifying Text Documents

    Get PDF
    Abstract-Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel, genetic algorithm based methodology, for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods
    • …
    corecore